Assessment of Stability in Partitional Clustering Using Resampling Techniques

Author

Hans-Joachim Mucha

Abstract

The assessment of stability in cluster analysis is closely related to the difficult problem of determining the number of clusters present in the data. The latter is the subject of many investigations and papers that consider different resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given dataset in order to investigate the stability of the results of partitional clustering. Specifically, we investigate only the very popular K-means method. Estimating the sampling distribution of the adjusted Rand index (ARI) and the averaged Jaccard index seems to be the most general way to do this. In addition, we compare bootstrapping with different subsampling schemes (i.e., with different cardinalities of the drawn samples) with respect to their performance in finding the true number of clusters for both synthetic and real data.
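The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes one common scheme, in which the full dataset is clustered once, each resample is clustered again, and the two partitions are compared by ARI on the resampled points. The names `bootstrap_stability` and `subsample_size`, the plain Lloyd's algorithm, and the toy one-dimensional data are all illustrative assumptions; only the Python standard library is used.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """ARI between two label lists of equal length (needs at least 2 items)."""
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    # Start from k distinct data points (resamples contain duplicates).
    centers = rng.sample(sorted(set(points)), k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster went empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def bootstrap_stability(points, k, n_boot=30, subsample_size=None, seed=0):
    """Mean ARI between the full-data partition and resampled partitions.

    subsample_size=None draws bootstrap samples (n points with replacement);
    an integer m draws subsamples of m points without replacement.
    """
    rng = random.Random(seed)
    base = kmeans(points, k, rng)
    n = len(points)
    scores = []
    for _ in range(n_boot):
        if subsample_size is None:
            idx = [rng.randrange(n) for _ in range(n)]
        else:
            idx = rng.sample(range(n), subsample_size)
        resampled_labels = kmeans([points[i] for i in idx], k, rng)
        reference_labels = [base[i] for i in idx]
        scores.append(adjusted_rand_index(reference_labels, resampled_labels))
    return sum(scores) / n_boot


# Toy data (an assumption for illustration): two well-separated 1-D groups.
data_rng = random.Random(1)
data = [(data_rng.uniform(0, 1),) for _ in range(20)]
data += [(data_rng.uniform(10, 11),) for _ in range(20)]

for k in (2, 3, 4):
    print(k, round(bootstrap_stability(data, k), 3))
```

Under this scheme, a value of K whose mean ARI stays close to 1 across resamples would be read as a stable candidate for the number of clusters. Passing `subsample_size` switches from bootstrapping to subsampling without replacement, which is one way to vary the cardinality of the drawn samples.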


Similar resources

A Detailed Study and Analysis of different Partitional Data Clustering Techniques

The concept of Data Clustering is considered to be very significant in various application areas like text mining, fraud detection, health care, image processing, bioinformatics etc. Due to its application in a variety of domains, various techniques are presented by many research domains in the literature. Data Clustering is one of the important tasks that make up Data Mining. Clustering can be...


A Comparative Study of Clustering Methods with Multinomial Distribution

In this paper, we study different discrete data clustering methods, which use the Model-Based Clustering (MBC) framework with the Multinomial distribution. Our study comprises several relevant issues, such as initialization, model estimation and model selection. Additionally, we propose a novel MBC method by efficiently combining the partitional and hierarchical clustering techniques. We conduc...


Constraint Based Partitional Clustering – A Comprehensive Study and Analysis Aparna

Data clustering is the concept of forming a predefined number of clusters where the data points within each cluster are very similar to each other and the data points in different clusters are dissimilar to each other. The concept of clustering is widely used in various domains like bioinformatics, medical data, imaging, marketing studies and crime analysis. The popular types of clustering techniques ar...


Pre Processing Techniques for Arabic Documents Clustering

Clustering of text documents is an important technique for document retrieval. It aims to organize documents into meaningful groups or clusters. Text preprocessing plays a major role in enhancing the clustering of Arabic documents. This research examines and compares text preprocessing techniques in Arabic document clustering. It also studies the effectiveness of text preprocessing techniques: ...


Informal version for personal use Scalable Clustering

2 Clustering Techniques: A Brief Survey
2.1 Partitional Methods
2.2 Hierarchical Methods
2.3 Discriminative vs. Generative Models
2.4 Assessment of Results
2.4.1 Internal (model-based, unsup...




Publication date: 2017
